Quality / Condition (14)

  • OverallQual
  • OverallCond
  • RoofStyle
  • ExterQual
  • ExterCond
  • Exterior1st
  • HeatingQC
  • GarageCond
  • BsmtCond
  • BsmtQual
  • KitchenQual
  • GarageQual
  • Foundation
  • PavedDrive

Area (5)

  • GrLivArea
  • TotalBsmtSF
  • GarageArea
  • WoodDeckSF
  • TotRmsAbvGrd

Bathrooms (4)

  • FullBath
  • HalfBath
  • BsmtFullBath
  • BsmtHalfBath

Year (3)

  • YearBuilt
  • YearRemodAdd
  • GarageYrBlt

Categorical missing values: fill with "None"
e.g., a house with no basement or garage gets "None"
Numeric missing values: fill with 0
e.g., fill with 0 when there is no garage
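
The two imputation rules above can be sketched as follows. This is a minimal illustration on a made-up four-column frame, not the notebook's actual code; which columns count as categorical or numeric is an assumption here.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the housing data (values are invented for illustration).
df = pd.DataFrame({
    "GarageQual": ["TA", np.nan, "Gd"],
    "BsmtQual": [np.nan, "Ex", "TA"],
    "GarageArea": [480.0, np.nan, 600.0],
    "TotalBsmtSF": [np.nan, 1200.0, 900.0],
})

# Assumed split of columns; the real notebook may select them differently.
cat_cols = ["GarageQual", "BsmtQual"]
num_cols = ["GarageArea", "TotalBsmtSF"]

# Categorical: a missing value means the feature is absent, so label it "None".
df[cat_cols] = df[cat_cols].fillna("None")

# Numeric: an absent garage or basement has zero area.
df[num_cols] = df[num_cols].fillna(0)

print(df)
```

Filling with the string "None" rather than dropping rows keeps houses without a garage or basement in the data, and lets a later ordinal mapping assign them the lowest level.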

Numeric encoding of categorical variables
e.g., 'ExterQual': {'None': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}
Dummy coding
'RoofStyle', 'Exterior1st', 'Foundation', 'PavedDrive'
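
A minimal sketch of both encodings, using the ExterQual mapping given above. The toy rows are invented, and applying the same mapping to KitchenQual is an assumption (the source spells it out only for ExterQual).

```python
import pandas as pd

# Toy rows (invented) with two quality columns and one nominal column.
df = pd.DataFrame({
    "ExterQual": ["TA", "Gd", "None", "Ex"],
    "KitchenQual": ["Fa", "TA", "Gd", "Po"],
    "RoofStyle": ["Gable", "Hip", "Gable", "Flat"],
})

# Ordinal mapping quoted in the text for ExterQual.
quality_map = {"None": 0, "Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

# Assumption: other quality columns share the same scale.
for col in ["ExterQual", "KitchenQual"]:
    df[col] = df[col].map(quality_map)

# Nominal columns have no natural order, so expand them into indicator columns.
df = pd.get_dummies(df, columns=["RoofStyle"])

print(df.columns.tolist())
```

The indicator columns are named `<column>_<level>`, which is where a name such as RoofStyle_Hip comes from.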

[Figure: LassoCV - Selected Feature Coefficients (x-axis: Coefficient Value)]
[Figure: Change in the Number of Selected Variables by Alpha Value (x-axis: Alpha; y-axis: Number of Selected Features)]
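
For reference, the kind of computation behind those two plots can be sketched with scikit-learn. The data below is synthetic, and the standardization step, the alpha grid, and the fold count are assumptions rather than the notebook's settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic data: 8 candidate features, only the first three drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

# Lasso penalizes coefficient size, so features are put on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# Cross-validation picks alpha; features with non-zero coefficients are "selected".
lasso_cv = LassoCV(cv=5, random_state=0).fit(X_scaled, y)
selected = np.flatnonzero(lasso_cv.coef_)
print("chosen alpha:", lasso_cv.alpha_, "selected columns:", selected)

# Refit over a grid of alphas and count the surviving features at each value.
alphas = np.logspace(-3, 1, 20)
n_selected = [
    int(np.count_nonzero(Lasso(alpha=a, max_iter=10000).fit(X_scaled, y).coef_))
    for a in alphas
]
print(list(zip(np.round(alphas, 3), n_selected)))
```

As alpha grows, the penalty pushes more coefficients to exactly zero, so the count of selected features falls toward zero.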

model 1: ['GrLivArea', 'OverallQual', 'TotalBsmtSF', 'GarageArea', 'YearBuilt', 'OverallCond', 'BsmtFullBath', 'ExterQual', 'BsmtQual', 'KitchenQual']

model 2: selected_features = ['OverallQual', 'ExterQual', 'KitchenQual', 'YearBuilt', 'BsmtFullBath', 'GrLivArea', 'TotalBsmtSF', 'GarageArea', 'WoodDeckSF', 'RoofStyle_Hip']
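
A feature list like these can be turned into an OLS fit with the statsmodels formula interface. The sketch below uses a made-up three-feature frame, so its numbers are unrelated to the real results; it shows the mechanics only, including how a boolean dummy column ends up labelled RoofStyle_Hip[T.True] in the output.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the housing data (all values invented).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "GrLivArea": rng.uniform(600, 3000, n),
    "OverallQual": rng.integers(1, 11, n),
    "RoofStyle_Hip": rng.random(n) < 0.2,  # boolean dummy column
})
df["SalePrice"] = (
    50 * df["GrLivArea"]
    + 12000 * df["OverallQual"]
    + 10000 * df["RoofStyle_Hip"]
    + rng.normal(0, 20000, n)
)

# Build "SalePrice ~ f1 + f2 + ..." from the feature list and fit by least squares.
features = ["GrLivArea", "OverallQual", "RoofStyle_Hip"]
formula = "SalePrice ~ " + " + ".join(features)
model = smf.ols(formula, data=df).fit()

print(model.summary())
```

Because the formula interface treats a boolean column as categorical, its coefficient is reported against the `False` baseline.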

OLS Regression Results (model 2 features)

Dep. Variable:     SalePrice           R-squared:            0.852
Model:             OLS                 Adj. R-squared:       0.852
Method:            Least Squares       F-statistic:          1425.
Date:              Thu, 24 Apr 2025    Prob (F-statistic):   0.00
Time:              22:17:48            Log-Likelihood:       -29016.
No. Observations:  2482                AIC:                  5.805e+04
Df Residuals:      2471                BIC:                  5.812e+04
Df Model:          10
Covariance Type:   nonrobust

                              coef   std err        t   P>|t|     [0.025     0.975]
Intercept               -4.544e+05     5e+04   -9.084   0.000  -5.52e+05  -3.56e+05
RoofStyle_Hip[T.True]    1.002e+04  1574.907    6.360   0.000   6927.648   1.31e+04
OverallQual              1.278e+04   746.795   17.110   0.000   1.13e+04   1.42e+04
ExterQual                1.453e+04  1709.012    8.499   0.000   1.12e+04   1.79e+04
KitchenQual              1.083e+04  1347.954    8.031   0.000   8182.549   1.35e+04
YearBuilt                 165.1962    26.669    6.194   0.000    112.901    217.492
BsmtFullBath             1.396e+04  1217.527   11.463   0.000   1.16e+04   1.63e+04
GrLivArea                  56.4054     1.603   35.180   0.000     53.261     59.549
TotalBsmtSF                32.2841     1.837   17.572   0.000     28.681     35.887
GarageArea                 33.9268     3.698    9.174   0.000     26.675     41.179
WoodDeckSF                 25.1514     4.761    5.283   0.000     15.816     34.487

Omnibus:           866.326             Durbin-Watson:        1.993
Prob(Omnibus):     0.000               Jarque-Bera (JB):     10081.930
Skew:              1.309               Prob(JB):             0.00
Kurtosis:          12.520              Cond. No.             2.36e+05

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.36e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

OLS Regression Results (model 1 features)

Dep. Variable:     SalePrice           R-squared:            0.854
Model:             OLS                 Adj. R-squared:       0.853
Method:            Least Squares       F-statistic:          1443.
Date:              Thu, 24 Apr 2025    Prob (F-statistic):   0.00
Time:              22:17:48            Log-Likelihood:       -29002.
No. Observations:  2482                AIC:                  5.803e+04
Df Residuals:      2471                BIC:                  5.809e+04
Df Model:          10
Covariance Type:   nonrobust

                              coef   std err        t   P>|t|     [0.025     0.975]
Intercept               -7.068e+05   5.9e+04  -11.970   0.000  -8.23e+05  -5.91e+05
GrLivArea                  59.0755     1.590   37.151   0.000     55.957     62.194
OverallQual              1.182e+04   767.035   15.416   0.000   1.03e+04   1.33e+04
TotalBsmtSF                36.3921     1.886   19.298   0.000     32.694     40.090
GarageArea                 36.0914     3.683    9.799   0.000     28.869     43.314
YearBuilt                 279.1536    30.757    9.076   0.000    218.841    339.466
OverallCond              5718.6463   586.232    9.755   0.000   4569.090   6868.202
BsmtFullBath             1.467e+04  1204.983   12.177   0.000   1.23e+04    1.7e+04
ExterQual                 1.48e+04  1702.489    8.691   0.000   1.15e+04   1.81e+04
KitchenQual              9208.5952  1354.144    6.800   0.000   6553.221   1.19e+04
BsmtQual                  248.3778   999.315    0.249   0.804  -1711.203   2207.958

Omnibus:           977.532             Durbin-Watson:        1.998
Prob(Omnibus):     0.000               Jarque-Bera (JB):     11176.054
Skew:              1.533               Prob(JB):             0.00
Kurtosis:          12.933              Cond. No.             2.80e+05

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.8e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

[Figure: one panel per feature, with x-axes GrLivArea, OverallQual, TotalBsmtSF, GarageArea, YearBuilt, OverallCond, BsmtFullBath, ExterQual, BsmtQual, KitchenQual; y-axis labels left blank]

Outlier removal: keep rows with df['GrLivArea'] <= 3500, df['GarageArea'] <= 1200, df['TotalBsmtSF'] <= 2500
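
A minimal sketch of applying these three thresholds with a boolean mask. The rows are invented, and combining the conditions with AND (a row must pass all three) is an assumption, since the source lists the conditions without saying how they are joined.

```python
import pandas as pd

# Toy rows (invented): only the first row passes every threshold.
df = pd.DataFrame({
    "GrLivArea": [1500, 4200, 2000, 1800],
    "GarageArea": [480, 600, 1400, 500],
    "TotalBsmtSF": [900, 1200, 1000, 2600],
})

# Keep a row only if it is within all three limits (assumed AND combination).
mask = (
    (df["GrLivArea"] <= 3500)
    & (df["GarageArea"] <= 1200)
    & (df["TotalBsmtSF"] <= 2500)
)
df_clean = df[mask].reset_index(drop=True)

print(len(df), "rows before,", len(df_clean), "rows after")
```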

[Figure: Distribution of Maintenance Score (x-axis: Score; y-axis: Count)]
[Figure: Sale Price vs Maintenance Score (x-axis: Maintenance Score; y-axis: Sale Price)]
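Distribution by grade band
The 10 lowest-priced houses among grade A
Map visualization of the distribution by grade
Average MaintenanceScore by neighborhood
[Figure: Neighborhood-wise Average Maintenance Score]
Most common maintenance grade in each neighborhood
Average SalePrice by grade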
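
The summaries listed above are group-wise aggregations, which could be sketched in pandas as follows. The rows are invented, `Grade` is a hypothetical column name for the maintenance grade, and `Neighborhood` is assumed from the plot title; how MaintenanceScore and the grades are derived is not shown in this section and is not reproduced here.

```python
import pandas as pd

# Toy rows (invented). "Grade" is a hypothetical name for the maintenance grade column.
df = pd.DataFrame({
    "Neighborhood": ["NAmes", "NAmes", "NAmes", "Gilbert", "Gilbert", "Gilbert"],
    "MaintenanceScore": [60.0, 80.0, 70.0, 90.0, 85.0, 50.0],
    "Grade": ["B", "A", "B", "A", "A", "C"],
    "SalePrice": [150000, 210000, 160000, 230000, 220000, 120000],
})

# Average maintenance score per neighborhood.
avg_score = df.groupby("Neighborhood")["MaintenanceScore"].mean()

# Most common grade per neighborhood (first mode if there is a tie).
top_grade = df.groupby("Neighborhood")["Grade"].agg(lambda s: s.mode().iloc[0])

# Average sale price per grade.
avg_price = df.groupby("Grade")["SalePrice"].mean()

# The lowest-priced grade-A houses (up to 10).
cheapest_a = df[df["Grade"] == "A"].nsmallest(10, "SalePrice")

print(avg_score, top_grade, avg_price, cheapest_a, sep="\n\n")
```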